EDUCATIONAL PSYCHOLOGY


BY V. A. C. HENMON 
Yale University 


The problem of what to include and what to exclude under the 
general title of educational psychology is a troublesome one. This 
bibliography for April, 1926, to April, 1927, lists general texts and 
manuals in educational psychology and references on the general
psychology of learning, the special psychology of school subjects, 
the pre-school child and exceptional children. References on intelli- 
gence tests, statistical methods, educational tests, and tests of person- 
ality and character are not included though they are usually included 
under the general heading and occupy most of the pages in journals 
devoted to educational psychology. Other bibliographical reviews in 
this BULLETIN cover these fields. 

1. General Texts. The effect of changing and conflicting concep- 
tions in psychology is reflected in various studies of what shall be 
included in courses for teachers by Peterson and Dunkle (128), Wat- 
son (165), and Worcester (183), and by the diverse character of 
general texts. “Gestalt” psychology in its applications to education
is represented primarily by the text of Ogden (122) and by the 
reviews and interpretations of Frank (40) and Gates (45). Conven- 
tional texts for class use include Benson, Lough, Skinner, and 
West (8), Sturt and Oakden (196), and Woodburne (177). Special 
mention should be made of the collection of readings in educational 
psychology by Skinner, Gast and Skinner (141). During the year 
no less than five books and monographs on how to study effectively 
have been published by Book (12), Butterweck (22), Crawford (28),
Fenton (39), and Headley (61), and in addition there have been 
various special studies.

2. General Psychology of Learning. There have been the usual
studies of the various types of learning and conditions which affect 
them. Notable is the falling off in studies of mental work, fatigue, 
and transference of training. Studies of the learning processes in 
very young children are reported by Jones (77) and Kirkpatrick (89). 
Play activities and a critical study of periodicity in play appear in 
LaSalle (94), Lehmann (96, 97, 98, 99, 100), Lehmann and Witty 
(101), and Witty and Lehmann (175). 

3. Psychology of School Subjects. The very great interest in the 
psychology and pedagogy of reading is shown by the books by
Brooks (16), Klapper (90), O’Brien (121), and Terman and 
Lima (151). The annotated bibliographies and summaries of read- 
ing investigations by Gray (55, 56) indicate great activity in this field. 
Special attention should be called perhaps to the studies of vocabulary 
and vocabulary building by Horn (71), Lebeis (95), Nice (119), 
Smith (144), and Symonds (147), and to the studies on the develop- 
ment of sentence structure and of language by Boyd (14), Piaget 
(132), and Smith (144). Comparative studies of reading ability 
under different methods and the effect of remedial teaching are too 
numerous for separate listing. 

In the field of arithmetic there is a single general text on methods 
of teaching by Newcomb (118) and two notable monographs by 
Buswell and John (20) and Judd (82). An annotated bibliography 
of arithmetic investigations is given by Buswell (21). Special studies 
in arithmetic, algebra, and geometry are numerous. 

Problems in the learning and teaching of writing have been stud-
ied by Hertzberg (64, 65), West (168), and Winch (172, 173).

4. Pre-School Child. Interest in this field continues unabated to 
judge by the numerous papers on the nursery school and the develop- 
ment of behavior patterns in very young children. The general dis- 
cussion of the psychology of the kindergarten-primary child by 
Pechstein and Jenkins (127) perhaps falls in this field. 

5. Exceptional Children. The year has produced two books deal-
ing with the gifted child: the second volume in the Stanford Genetic
Studies of Genius by Cox (27) and the text by Hollingworth (67). 
A bibliography on the gifted child is given by Jensen (76). Special 
studies are numerous. Contrasting characteristics of bright and dull 
and Brown (17). The teaching of the dull and retarded children is 
treated fully by Inskeep (75). Interesting features of the study of 
the feebleminded, the psychopathic, and the problem child are case
studies exemplified by those of Kelly (87) and Woolley (182). 

Special attention should be called to critical studies of radiographs 
by Carter (23) and McElhannon (105), to the studies of the effect 
of size of class by Harlan (59) and Metzner and Berry (112), to the 
investigations on the effect of the summer vacation on abilities by
Elder (38) and Noonan (120) and to the general survey of mental 
growth and decline by Hollingworth (66).

General. Several books in the general field of mental measure-
ment have recently appeared, the most notable of which is that by
Thorndike (129). In this book we have for the first time the con- 
struction of a measuring scale of equal intervals from a defined zero, 
the lack of which has caused much confusion in mental measurement. 
In addition to this Thorndike has corrected the scores of several 
well-known intelligence tests, discussed in detail the distribution of
intelligence ratings and the probable growth of intelligence, and given
the best description of intelligence with reference to such attributes
as altitude, range, area, and speed. It is the most significant con-
tribution to our subject since the work of Binet. Freeman’s (43)
book covers the whole range of mental tests with primary emphasis
upon intelligence tests and he discusses the problems of growth,
variability, and the general nature of intelligence. Claparède (23)
also gives a general description of theories of intelligence, and makes
a distinction between age tests and tests of aptitude. His book
includes a translation of the Stanford Revision and also contains some
foreign tests not well known in this country. Lincoln (77) deals
primarily with educational measurement, but includes an elementary
account of intelligence testing.

With reference to the definition of intelligence, Warren et al.
(140) give a formal definition among their list of psychological
terms. Pintner (106) presents an empirical view of intelligence as
an evaluation of any reaction, and discusses its bearing upon the
interpretation of test results. Freeman (42) proposes a new defini-
tion. Kelley (69) discusses the possibilities of mental types which
diverge from the average, and defines the characteristic of a type
trait. He finds three types, the social service, the dominant, and the
equilibrium types.

Of historical interest we have the monograph by Martin (81) on 
Alfred Binet, attempting an evaluation of his life’s work. 

The relation between speed and intelligence is discussed by Peak 
and Boring (100) and also by Sisk (120). The relation between 
habitual or informational tasks and power or novel tasks is studied
by Tilton (131), who finds a high correlation among school children 
and concludes that the former are as adequate measures of intelli- 
gence as are the latter. Orleans (98) attempts an analysis of the 
nature of difficulty of a test item, gives seven characteristics of diffi- 
culty, and finds that such characteristics as complexity, obviousness 
of solution, and familiarity are most highly correlated with difficulty. 
Hamid (51) attempts a similar qualitative analysis of test items which 
differentiate best between dull and bright children, and finds such 
qualities as complexity, abstractness, etc., to be of diagnostic value, 
while novelty of fundaments is poor. Slocombe (121) applies the 
intellective saturation formula of Spearman to nine group tests and 
as a result discusses ways in which tests might be improved. 
Freyd (44) finds the general intelligence of the socially and mechani- 
cally minded to be about the same. 

In the general field of statistical technic as applied to intelligence 
measurement we have the work of Otis (99), who also discusses the 
problem of the growth of mental ability. The interpretation of 
Burt’s famous regression equation is still continued by Thom-
son (127). Anderson (4) finds no advantage of sigma weighting
over raw scores for the Kuhlmann tests. Thurstone (130) criticizes 
severely the ordinary concept of mental age and advises its abandon- 
ment in favor of percentile or sigma ratings. Heinis (53) criticizes 
the I.Q. and develops a formula for a different index to take its place. 
Dvorak (35) also subjects the I.Q. to criticism and shows the relation 
between change in I.Q. and growth of intelligence. Willson (145) 
studies the variability by age and by grade of many different tests. 
Piéron (103) shows that the discriminative index proposed by 
Claparède to differentiate between tests of age and tests of aptitude
would vary with the age under consideration. 

The influences of heredity and environment are most ably dis- 
cussed by Kelley (68), who concludes that gifted children tend to be 
levelled up in all school abilities as they grow older. Pressure is 
brought to bear to eliminate idiosyncrasies in most school subjects. 
Courtis (25 and 26), after an elaborate study, concludes that the 
level of development is fixed by hereditary factors, while training 
contributes a very small amount in addition. Wechsler (141) pre-
sents coefficients of variability at various ages and argues that educa-
tion is now making children more alike. Jones (65) measures the
effect of age and experience on many types of intelligence test mate-
rial and concludes that sentence completion, analogies, information,
etc., are least affected by age and experience. Brooks (15) raises the 
practical question as to the increase in accuracy for sectioning in the 
junior high school by combining two tests. He finds great variation 
among single tests as well as among pairs of tests. 

Dearborn (31) outlines an extensive program for repeated meas- 
urements of children to determine constancy of I.Q. and growth of 
intelligence. Broom (16) gives the results of retests of the Terman 
Group Test for 50 high school seniors, showing a correlation of .86. 
Nettles (95) with the same test for 130 high school students finds a 
correlation of .85 after a three-year interval. Bowie and Laws (11) 
repeated the Northumberland Test after an interval of six months 
and obtained a correlation of .87. Wentworth (142) gives the re-
sults of one thousand Binets in the first grade. Retests of 145 cases
gave a correlation of .82. Reasons for large differences in I.Q. are
discussed. The study also includes 112 case studies of different 
types of children. Hurlock (62) finds that either praise or reproof 
will raise the I.Q. on retests as contrasted with a control group. 
Woolley (147) finds a general increase in mental tests from ages four- 
teen to eighteen, and a tendency for the mental growth of the superior 
children to continue longer than that of the inferior. She finds no dif- 
ference between the sexes, a correlation between mental and physical 
tests of about .4, practically no correlation with earning capacity and 
little correlation with home ratings. Thorndike (128) discusses the 
gain in score on I.E.R. tests from thirteen to nineteen, finds the gain 
from thirteen to sixteen fairly steady and concludes that the doctrine 
of growth ceasing at fourteen or sixteen should be abandoned. Arthur 
(7) studies the I.Q.’s of 92 pairs of foreign siblings and finds a dif- 
ference of about seven points in favor of the younger sibling, 
whereas there is no such difference with similar American pairs. 
She raises the question of possible physiological changes or environ- 
mental changes. Gilmore (46) reports a superior gain of about ten 
per cent of score on the Otis test by a coached group as contrasted 
with an uncoached group of children. Chapman (20) finds con- 
siderable gain from coaching on Alpha but no such gain after coach- 
ing on Burt’s Reasoning Tests. 

Scales. The Merrill-Palmer scale for children of pre-school age 
is described by Stutsman (126), who gives directions for giving the 
tests as well as percentiles for children from 18 to 66 months. Most
of the tests are of the performance type, but there are some verbal
tests as well. The growing interest in performance tests is also seen
in the scale of fourteen tests proposed by Squires (123). This seems 
a successful attempt to construct performance tests for high levels 
of intelligence. Each test is carefully described and the method of 
application has been determined. As yet there are no norms. The 
ability to draw a man has been scaled by Goodenough (48) who 
claims it to be a good measure of intelligence for children. It corre- 
lates with Binet .76, with school success .33 and with teacher ratings 
of intelligence from .56 to .86. An elaborate study of the Pintner- 
Paterson Performance Scale with delinquents is reported by 
Aden (2).
The correlations of each test separately with the Binet are re- 
ported, but no correlations for the total performance scale. Com- 
parison is also made with the Stenquist mechanical tests. Another 
study of miscellaneous performance tests is reported by Worthing- 
ton (148). Correlations with the Stanford-Binet for individual tests 
range from .41 to .79 for a wide range of intelligence. In French 
literature there have appeared the tests for young children by Des- 
coeudres (33) and the Scale of Vermeylen (136). Descoeudres de- 
scribes a great number of verbal and performance tests and gives 
tentative standards. The tests can be arranged into a sort of scale. 
Vermeylen gives fifteen individual tests the results of which are re- 
ported on a psychograph for each child, allowing, as he claims, a 
better analysis of the several different aspects of intelligence. 

Group Tests. Very few new group tests have appeared recently. 
The most notable addition is the Multi-Mental Test of McCall (82). 
It is a verbal test requiring reactions to words in different relation- 
ships. The scoring is determined by the frequency of responses 
obtained. The validity of each item on the test has been computed. 
In grades III to VIII it correlates .93 with a good composite criterion, 
.89 with Binet, and .93 with the N.I.T. The test will undoubtedly
prove a valuable addition to the intelligence tests now available. 
Arthur (6) proposes a group test made up of some of the common 
test material such as opposites, completion, etc. 

With reference to group tests already published, we have the 
work of Dale (29) who has adapted Burt’s reasoning tests, con- 
structed three forms and given them to English and American chil- 
dren, finding similar results in the two countries. Reliabilities and 
validities have been calculated. Pintner (107) reports new norms 
for the Pintner-Cunningham Primary Test based upon 29,000 cases, 
as well as correlations with Binet (from .60 to .88) and with other
group tests (from .67 to .79). He also attempts a geographical dis- 
tribution of his results. Bregman (13) reports useful percentile 
equivalents for Alpha scores based on probable distribution of in- 
telligence in the entire population. These will be useful in interpret- 
ing the scores made by adults on the test. 

In French we find several reports of single tests, not combined 
into a battery. Piéron (104) studies in detail six tests, some not
known in America, and reports intercorrelations. Piéron (105) also 
studies in detail a number-series completion test, giving a rank 
order of difficulty for each item. There is little gain by repetition 
of the test two or three times. Christiaens (22) reports a novel form-
series completion test constructed by the Russian Dounaiewsky.
Decroly (32) gives results from a wide testing with a French trans-
lation of the Thurstone Test and he also presents a translation of
Ballard’s Test. In German Klüver (71) gives a description of many
tests suitable for first grade children, but they are not combined into 
a battery. 

School Children. A yearbook of the National Society for the 
Study of Education (94) is devoted to the discussion of adapting 
the schools to individual differences and naturally includes many ref- 
erences to intelligence tests as means of measuring individual differ- 
ences. Young (150) gives distributions for the N.I.T. and Binet by 
age and grade and advocates a percentile scheme of scoring. Torger- 
son (134) shows how classification according to intelligence reduces 
the number of failures, and that the low A.Q.’s of high I.Q. cases are 
often caused by faulty grade placement. O’Flaherty (97) gives the 
distribution of the I.Q.’s on group tests of a large number of children
over age for their grade. Over 50 per cent have I.Q.’s between 
71 and 90. Strachan (125) gives distributions for 22,000 primary 
children on the Stanford-Binet. The white median is about 100 and 
the colored about 92. Wagoner (138) investigates the constructive 
ability of young children, and Viele (137) gives the results of four 
primary group tests in grades I and II. 

Two reports deal with the intelligence of private school pupils. 
Rogers (113) gives a summary of the I.Q.’s of 2,676 pupils in differ- 
ent schools tested by different tests. The median I.Q. is 114, and 
there are only 14 per cent below I.Q. 100 and as many as 16 per
cent above I.Q. 130. Jones (63) reports Army Alpha results in 
grades 9 to 12, showing that a median score for grade 10 in this 
preparatory school is about equal to the median for senior college
students in general. 

The general use of tests in high school is discussed at length by 
Flemming (41). In all 27 variables are correlated with teachers’ 
marks. A detailed study of the Terman Group, the Otis Self-Ad- 
ministering and the Miller Tests is made. Pickell (102) discusses 
in general the procedure involved in grouping according to ability 
in the junior high school, while Keener (67) gives the results of 
many Otis Classification Tests used in the actual classification of 
children, and shows the greater achievement as measured by the 
Stanford Achievement Test of the brighter groups. Hughes (58) 
gives results on the Otis S.A. Test for 29 schools of different sizes 
and shows that the larger schools in general have the higher I.Q.’s. 

Stainer (124) shows the use of intelligence tests in a city in Eng- 
land for the purpose of choosing children for scholarships for higher 
education. An I.Q. of 110 on the Northumberland Test is the limit 
below which a child is not likely to make good on the scholastic part 
of the examination. 

The Feebleminded. Pressey and Pressey (111) in their general 
book on mental abnormality discuss the relationship between I.Q. and 
the diagnosis of feeblemindedness. Any I.Q. below 70 suggests 
feeblemindedness. The I.Q.’s of a miscellaneous group coming to a 
psychological clinic are given by Merrill (85), who reports 33 per cent
of such cases below I.Q. 70. Jones (64) reports the phenomenal
memory of an adult with an I.Q. on the Stanford of 74 or 79. In 
the past this case would have been called an idiot savant, but the 
author shows how specialized is this memory, for on ordinary stand- 
ardized memory tests the individual falls below average adult level. 
Ley (76) gives an elaborate comparison between the normal and 
special class children of the same school district of Brussels with 
reference to hereditary and social factors. Of the normal 3 per cent, 
of the abnormal 40 per cent walked later than two years; of the 
normal 2 per cent, of the abnormal 40 per cent talked later than two
years.

The Superior. Hollingworth’s (55) book on gifted children is 
an excellent summary of the great amount of material that has been 
collected during the past few years. It discusses the general mental 
and physical characteristics, and gives interesting case studies of sev- 
eral children with I.Q.’s above 180. It gives many suggestions with 
reference to the curriculum and schooling for superior children. Cox 
(27) brings together the boyhood records of 300 eminent men who
were born between 1450 and 1850. From these records three ex- 
perts estimated the probable I.Q.’s. These range from 100 to 200 
with a mean about 135. The belief that many eminent men were 
dull and foolish in childhood seems now definitely exploded. The 
case studies in Part II of the book will prove of great value for 
further research. Hollingworth and Monahan (57) show that chil- 
dren with I.Q.’s above 135 are superior in speed to unselected children 
in the Tapping Test. Such children are not especially lacking in 
motor qualities. This point is further emphasized by Monahan and 
Hollingworth (91) in their comparison of superior and average chil- 
dren in such motor activities as standing broad jump, chinning, 
strength of grip. Garrison and Pullias (45) find considerable corre-
lation between I.Q. and some physical measurements (grip, vital
capacity, weight, height) for a small group of I.Q.’s between 116 and
170. Jones and McCall (66) find that bright children who have been 
accelerated in the grades gained more promotions in subjects in high 
school than average pupils. Bright children taught in special classes 
gain more than equally bright children in regular classes. Holling- 
worth (56) finds that bright children respond to the Seashore Musical 
Ability Tests about the same as children of like chronological age.

College Students. The use of intelligence tests with college stu-
dents is very great. Toops (132, 133) shows from the result of a
questionnaire sent to 110 colleges that 60 per cent are making official
use of tests. He also gives a table of the uses of tests. No college 
uses them as sole basis for entrance. The Thorndike College En- 
trance Test is most commonly used. The median validity coefficient 
reported is .46, average reliability of tests .87, of college marks .66. 
Brigham et al. (14) show that over eight thousand took the first 
intelligence test given by the College Entrance Board. The reliability 
of the test is about .94. Validity can only be estimated at this time.
There is no real difference between the mean score of the girls and 
that of the boys. Ellis (37) compares 121 students tested as fresh- 
men in 1922 and again as seniors in 1925 on the Miller Test. The 
correlation is .70. The score of those who stayed on in college is 
+.28 sigma higher than the score of those eliminated from the 
original total freshman group. Miller (89) shows that sectioning on 
the basis of intelligence tests gives marked differences for groups on 
objective examinations. Rogers (114, 115, 116) discusses various 
aspects of intelligence and college problems, such as elimination, 
choice of studies, and success in class work. Yarborough (149) finds
a correlation of .51 between students’ estimates of their fellow stu- 
dents and intelligence scores. Whinery (143) finds that highest 
scores are made by youngest, lowest by oldest students. The average 
C.A. of the top 5 per cent is 18.16 and of the bottom 19.52. 

Langlie (74) finds no real difference between aptitude and train- 
ing in the Iowa Placement Tests. Anderson and Spencer (3) give 
validities and reliabilities for the Yale Intelligence Test. Interest- 
ing reports of tests used in colleges are given by Murray (93), 
Miner (90), Bear (9), Guiler (50) and Arlitt and Hall (5). Hughes 
(59) asks why intelligence scores are not more highly predictive of 
college success, and Crane (28) finds that exclusion of 10 per cent 
by means of the Thurstone Test is less desirable than similar exclu- 
sion by means of an achievement test. Peters (101) proposes a 
method for arriving at A.Q.’s for high school and college by reducing 
raw scores to T scores and then dividing. He finds negative corre- 
lations from .29 to .61 between intelligence and A.Q., just as has been 
found in elementary schools. Leatherman and Doll (75) discuss 
the maladjustments of college students. Two reports dealing with 
intelligence testing in normal school are given by Madsen (79) and 
by Rich and Skinner (112). 
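
The procedure attributed above to Peters (101), reducing raw scores to T scores and then dividing, can be sketched as follows. This is only an illustration under stated assumptions: the review does not give Peters's exact formula, so the sketch assumes a simple linear T score (mean 50, standard deviation 10 within the group) and takes the A.Q. as 100 times the ratio of the achievement T score to the intelligence T score. The function names and the sample scores are hypothetical.

```python
from statistics import mean, pstdev


def t_scores(raw):
    """Linear T scores (assumed form): mean 50, SD 10 within the group.

    Requires at least two distinct raw scores, otherwise the SD is zero.
    """
    m, s = mean(raw), pstdev(raw)
    return [50 + 10 * (x - m) / s for x in raw]


def accomplishment_quotients(achievement_raw, intelligence_raw):
    """A.Q. taken here (an assumption) as 100 * achievement T / intelligence T."""
    ach_t = t_scores(achievement_raw)
    int_t = t_scores(intelligence_raw)
    return [100 * a / i for a, i in zip(ach_t, int_t)]


# Hypothetical raw scores for five students (illustrative only).
achievement = [40, 55, 60, 70, 75]
intelligence = [90, 100, 110, 120, 130]
aq = accomplishment_quotients(achievement, intelligence)
```

Under these assumptions a student whose standing in achievement exactly matches his standing in intelligence receives an A.Q. of 100. Because the intelligence T score is the divisor, students of high intelligence tend to receive lower quotients unless their achievement is correspondingly high, which is consistent with the negative correlations reported.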

The Delinquent. Root (117) gives a very elaborate and detailed 
report of mental and educational tests given to 1,916 prisoners. The 
median I.Q. on the Stanford-Binet is 76. Merrill (84) finds an
average I.Q. of 82 for 236 juvenile delinquents, and she gives distri- 
butions by offense, nationality, and the like. Maris (80) makes a 
survey of juveniles in Manitoba by means of a group test and finds 
8 per cent feebleminded and 21 per cent borderline. Dougherty (34) 
studies the mechanical and performance ability of delinquent boys 
and girls. His scores are higher than Stenquist’s norms. He be- 
lieves that some social maladjustments may be caused by too much 
stress on a literary type of training in school. 

Racial Comparisons. Lacy (73) compares 817 colored with 5,159 
white children on the Stanford-Binet in kindergarten and grades 1 
to 3. The average I.Q. of the colored drops steadily from 99 to 87, 
while the white I.Q. is almost stationary. 

The study of racial groups in this country is still arousing great 
interest. The most extensive report is by Hirsch (54). Over 5,000 
children were given group tests, the Polish Jews and Swedes rank-
ing highest (higher than the Americans) and the Negroes and Portu-
guese lowest. He finds no connection between high I.Q. and Nordic
blood, indeed his whole book is a protest against the use of the word 
“race” as applied to national groups. Kirkpatrick (70) restricts his 
study to eleven-year-olds of four nationalities, Americans, Finns, 
French Canadians, and Italians. He finds a language handicap in 
the use of the usual verbal group test, but, of course, finds marked 
differences between the groups with or without language handicap. 
The Finns are about the same as the Americans, whereas the Italians
and French Canadians fall very low. Sandiford and Kerr (119) 
report an extensive study with the Pintner-Paterson Performance 
Scale on Japanese and Chinese children in Vancouver. Both groups 
are about equal to the norms for the tests. The authors conclude 
that due to selection the “Japanese are the most intelligent racial
group in British Columbia.” Goodenough (49) gives average I.Q.’s 
for several racial groups on her scale, Jews 106, Americans 101.5, 
Italians 89, Southern Negroes 79, and many other groups. Wang 
(139) makes an analysis of tests given to Chinese, Russian and Negro 
university students in this country and concludes that the Chinese 
students are handicapped by language difficulties. Boody (10) tests 
immigrant children at Ellis Island. 

Of tests given in foreign countries we have the report of Porteus 
and Babcock (109) in the Hawaiian Islands. On a modified Binet 
and on the N.I.T. the Chinese are highest excluding the Americans;
on the Maze Test the Japanese are highest including the Americans;
on a Form Test the Hawaiian boys are highest. Carreon’s (19) 
work in the Philippines consists of Haggerty tests given to 665 
pupils in different grades. The scores are much below the norms 
for the test. 

Inheritance. Merriman’s (86) study of a great number of twin 
pairs on several different tests supports previous work in this field. 
His correlations range from .66 to .87. Averill and Mueller (8) 
show great resemblances in ten pairs of twins on several mental and 
physical tests. Muller (92) gives a detailed report of identical twins 
separated when two weeks old, who did not see each other till 
age 18. In spite of very different types of schooling, their scores at 
age 18 on Army Alpha and Otis Advanced are almost identical. The 
Pressey and Downey tests show the twins to be very unlike. 

Chapman and Wiggins (21) give I.Q.’s for families of different 
sizes. The correlation between size of family and I.Q. is negative
.33, between social status and I.Q. positive .32. Bradford (12) finds
a correlation of negative .25 between size of family and I.Q., and
argues that these and other facts indicate a gradual lowering of in- 
telligence from generation to generation. The distribution of the 
intelligence of children according to the occupations of the parents is 
shown for the Isle of Wight by Macdonald (78) and for high school 
and university students in British Columbia by Sandiford (118). 
The rankings of occupations are in general agreement with what has 
been found in the United States. 

Miscellaneous. With reference to physical traits we have Heid- 
breder’s (52) finding of no correlation between intelligence and the 
height-weight ratio among university students. Abernethy (1) finds 
no correlation between M.A. and ossification, dentition or precocity 
of maturing. McHale (83) finds no real differences in intelligence 
between over-, under- and normal-weight children. Smillie and 
Spencer (122) find decrease in I.Q. with increased intensity of hook- 
worm infection. Woods (146) finds no change in I.Q. in classes 
where nutrition has been taught and practiced. Fernald and Arlitt 
(39) report results for 194 crippled children, and find that the 
crippled are only slightly more limited in intelligence than their sibs 
(r = .52), that injury occurring after school age gives a higher I.Q.,
and that I.Q. varies with type of injury. 

With reference to mental traits, Hughes (60) gives correlations 
of intelligence among high school students with quickness of thought,
.42; memory, .40; personality, .37, and a lot of other traits down to
respect for authority, .17. Earle (36) on the other hand finds no
correlation between intelligence scores and ratings of character traits 
of 212 student nurses. Caldwell and Wellman (18) rank children 
with reference to leadership in different activities. Among this 
selected group there would not appear to be a high correlation between 
leadership and M.A. or I.Q., although the leaders are seldom in the 
lower quarter of the group. Fenton (38) tries to evaluate the effect 
of knowledge of intelligence score on normal school students. 
Darsie (30) gives suggestions as to how intelligence test results 
should be reported to teachers and parents. Brown (17) studies the 
unevenness of the abilities of dull and bright children in three dif- 
ferent types of intelligence tests. He finds the amount of unevenness 
in dull and bright groups to be the same. 

Fernald and Sullivan (40) give a survey of 1,712 policemen on 
the Army Alpha. The distribution is somewhat superior to the 
army draft. Hurlock (61) finds praise and reproof of about equal 
Miles (88) shows negative correlations between intelligence and gains
in reading among high school pupils, with the exception of one class 
which possessed a teacher who stimulated the bright as well as the 
slower ones in the class. Whipple (144) points out a sex difference 
in intelligence scores in favor of girls at age eleven on the N.I.T. 
and at all ages from eight to thirteen on the Illinois test. Pintner 
(108) shows a wide range in scores obtained by different examiners 
scoring the same N.I.T. paper due to errors and ambiguity in direc- 
tions for scoring. Obrien (96) finds no difference between seven 
and eight month schools in achievement or intelligence and no cor- 
relation between I.Q. and attendance. Pressey (110) describes an 
apparatus for testing and scoring. Kohn-Schachter and Weigel (72) 
have adapted Thomson’s “Hindustani Test” for German use into 
a “Speisekartentest” and have carefully worked out directions for 
giving it. Michael and Crawford (87) obtain a fair correlation between 
control of voice, particularly inflection, and intelligence. 

The use of intelligence tests in general educational diagnosis and 
in the general system of educational testing in schools is well illus- 
trated by Van Wagenen (135) where the educational record kept is 
always in relation to some mental rating. Goodenough (47) shows 
that children with high I.Q.’s have low A.Q.’s because they have been a much 
shorter time in school. Clark (24) gives the distribution of fourteen 
thousand children tested by the N.I.T., together with the distribution 
of their A.Q.’s. 

EDUCATIONAL TESTS 


BY LAURA KRIEGER and W. A. McCALL 
Teachers College, Columbia University 


Introduction and History. In the past year there has been a 
marked development in measuring school achievement. This develop- 
ment is evident in the greater refinement and practical applications 
of survey, diagnostic, and practice tests; in the increasing tendency 
toward using objective examinations; and in more refined evaluation 
of the tests and test results. 

The problem of the standardization of educational tests has been 
appreciated for many years. Russell (42) says that in 1864 a report 
was made of an attempt to make standards by means of a “Scale 
Book.” He also says that in 1897 Dr. Rice established the fact that 
it was both necessary and possible to measure results in education. 
A brief history of educational tests is also given. Kirk (25) 
describes the development of tests in handwriting. 

Russell (42) further says: “New methods were developed and 
further tests devised, until at the present time there are available 
many tests of various sorts for almost all subjects in practically every 
stage of our educational program, from kindergarten to college. 
Standard tests have enlarged the values of teaching and have reduced 
the errors of examining. They have made comparisons, diagnosis, 
and remedial treatment possible, as well as the implications of 
teaching.” 

Tendencies in Educational Testing. The trends in educational 
testing are various though more or less well defined. Buckingham (5) 
proposes what he thinks are the most significant tendencies exhibited 
by the test movement. 

Tests in music, art, and content subjects are gradually being 
devised since the need for them is becoming evident. Holling- 
worth (23) studied the musical sensitivity of gifted children. 
Karwoski (24) devised an objective test in art appreciation. Mc- 
Conathy (31) studied the musical accomplishment of public school 
children. Wilson (54) describes the Continuity Test in History. 

Educational tests are more and more being considered in the 
classification of pupils. MacPhail (27) discusses the classification 
of freshmen into sections for instruction in English. Baker (3) 
hopes that psychological and educational tests will eventually revolu- 
tionize methods of classification in Brazil. In discussing the place 
of the principal in the program of measurement Touton (47) states 
that the principal should guide in the classification of his pupils. 

Mead (32) offers several suggestions for the training of teachers 
in the use of educational measurements. Cavins (7) discusses an 
experiment with standardized tests in a state teachers’ examination. 
He emphasizes their value for this purpose. “The tests permit an 
objective rating of teachers which will permit widespread com- 
parisons. They introduce the teachers in a concrete way to the move- 
ment of educational tests and measurements. They permit a ready 
method of analyzing and diagnosing any individual’s specific and 
general weaknesses in various subjects.” This tendency toward 
acquainting teachers with the uses of measurements by using the 
measurements on them is a most significant one. 

A distinctive tendency is noted in Courtis’ (10) study of the 
influence of certain social factors upon achievement, and the study 
made by Good (21) of the effect of mental set on performance in 
reading. 

There is a definite trend from the general to the specific. By 
means of a survey test the principal or supervisor can find out what 
a pupil’s ability is. However, tests are being devised to aid the 
teacher herself in evaluating and improving classroom instruction. 
Diagnostic tests and practice tests assist the teacher in meeting her 
problems. McCall and Crabbs’ (23) Standard Test Lessons in Read- 
ing is an example of such practice tests in reading. There seems to 
be a tendency to extend the use of practice tests beyond the skill 
subjects to the content subjects. 

According to Russell (42), “To supplement the use of standard 
tests it is necessary for the teacher to devise tests of his own which 
will give results of a kind which can be used at the times between 
giving of standard tests.” The new type of examination, according 
to Popenoe (37), “provides definite situations to which the pupil 
must react and makes it possible to estimate the worth of his valuations 
in a definite manner.” Simple objective examinations meet 
the immediate need which standardized tests necessarily cannot meet. 
They measure more accurately and precisely what the old-type examination 
attempts to measure; they lighten the work of both the 
teacher and the pupil, since pupil scoring is possible; and they are 
more reliable because of their objectivity in scoring. Van Wagenen 
(50) advocates the use of objective examinations in colleges. Chris- 
tensen (8) offers a suggestion to correct for guessing in true-false 
examinations and make results more reliable. McClusky and Cur- 
tis (30) also suggest a modified form of the true-false test. Remmers 
and Remmers (40) find that there is no evidence of negative carry- 
over from the taking of true-false examinations. Buckingham (5) 
says that the new-type examinations are being tried out by various 
examining bodies, e.g., the U. S. Civil Service Commission, the College 
Entrance Examination Board, and the Board of Examiners for the 
New York City Public Schools. 

Tests Applied to Measuring Methods of Teaching. In several 
studies of school achievement standardized tests were used to measure 
the results obtained. In the past year reading has been emphasized, 
although the trend is gradually toward measuring content subjects. 
Alderman (1), Miles (33), Neal and Foster (34), and Touton 
and Heilman (46) used the Thorndike-McCall Reading Scale for 
measuring improvement in reading comprehension. Peyton and 
Porter (36) used the Haggerty Reading Test Sigma I and the 
Stanford Primary Reading Test at the end of the first year of school 
to ascertain whether children taught in a modern way really made 
more efficient readers than those taught in the older, formal manner. 
Gates (15) measured achievement in pronunciation, oral reading, and 
silent reading to compare the results achieved by using a modern 
systematic versus an opportunistic method of teaching. Gates (11) 
measured improvement in reading resulting from methods used in 
teaching deaf children. 

Touton and Heilman (46) used the Hudelson Seven S Spelling 
Scale to study the achievements of high school seniors. Gates (15, 
16) used the Ayres-Buckingham Scale for measuring spelling ability 
in his two studies. 

Gates (15) also tested achievements in penmanship, drawing, and 
arithmetic by means of standardized tests. 

Ashbaugh (2), in studying senior high school English as revealed 
by Tressler’s English Minimum Essentials Test, inferred that because 
the aims and goals in English are usually vague, things have not been 
taught effectively. Van Wagenen (5) used three components of his 
English Composition Scale in studying achievement in rhetoric. 

Brewington (4) says: “Measuring and testing achievements in 
all commercial subjects is one of the functions of the Research Committee 
in Business Education appointed at the National Education 
Association meeting in Philadelphia. In commercial education no 
organization for outlining, centralizing, directing, and stimulating 
measurement and testing has obtained. The appointment of this 
research committee marks the beginning of such an organization.” 

Combining the Results of Mental and Educational Tests. Courtis 
(11) has come to the startling conclusion that children succeed in 
their school work in general accordance with their development or 
maturity. He found that in attempting to predict individual success, 
as measured by the Stanford Achievement Tests, the departure of 
predicted from actual score due to uncontrolled factors may amount 
on the average to as much as 4 per cent. Also, approximately 7 per 
cent of the present errors have been proved to be caused by imperfections 
in the measuring instruments themselves. Only the 3 per cent 
remaining unexplained represents the combined distinctive effect of 
all other factors, such as teaching ability, social status, emotional 
control, home influence, etc. 

Russell (42) mentions the achievement ratio as a technique to 
“give to teachers a leverage in teaching” and which “makes analyses 
of the achievement of children in terms of their abilities to achieve 
a tool in teaching.” 

Darsie (13) offers an accurate though laborious method of com- 
bining the results of several tests into an educational composite score. 
He objects to former methods as not taking account of the variability 
of the scores obtained, of the comparative reliability and validity of 
the tests, nor of the relative significance of the subjects as factors in 
school progress. 
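
The kind of weighting Darsie calls for can be illustrated with a minimal sketch. It is not Darsie's own procedure, which this review does not spell out. It assumes only that each test is first expressed in standard-deviation units, so that a test with widely spread raw scores does not dominate, and is then multiplied by a weight standing for its judged reliability, validity, and importance. The test names, raw scores, and weights below are hypothetical.

```python
from statistics import mean, pstdev

def z_scores(raw):
    """Express raw scores in standard-deviation units around the group mean."""
    m, s = mean(raw), pstdev(raw)
    if s == 0:
        raise ValueError("scores do not vary; cannot standardize")
    return [(x - m) / s for x in raw]

def composite(scores_by_test, weights):
    """Weighted mean of standardized scores, one composite per pupil.

    scores_by_test: dict of test name -> list of raw scores (same pupil order)
    weights: dict of test name -> weight (hypothetical, not taken from Darsie)
    """
    standardized = {t: z_scores(raw) for t, raw in scores_by_test.items()}
    n = len(next(iter(scores_by_test.values())))
    total_w = sum(weights[t] for t in scores_by_test)
    return [
        sum(weights[t] * standardized[t][i] for t in scores_by_test) / total_w
        for i in range(n)
    ]

# Hypothetical raw scores for four pupils on two tests with very different spreads.
scores = {"reading": [40, 50, 60, 70], "arithmetic": [8, 9, 10, 11]}
weights = {"reading": 2.0, "arithmetic": 1.0}
result = composite(scores, weights)
print([round(c, 2) for c in result])
```

Because both hypothetical tests rank the pupils identically, the composite simply reproduces the common standard scores; with real data the weights decide how much each test pulls the composite.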

Buckingham (5) objects to the use of grade scores when educa- 
tional and intelligence tests are brought together to gain a knowledge 
of achievement. 

McCall and Jones (28), in making a comparison of the educa- 
tional progress of bright children in accelerated and in regular classes, 
found that the pupils in special classes grew more during a two-year 
period. Wilson (53) found that of unclassified pupils of the same 
mental ages, those of I.Q.’s of the lower quartile do work more 
nearly up to their possible achievement than those in the middle or 
higher quartiles. Torgenson (45) shows that “when pupils are given 
educational opportunity equal to their ability, the bright pupils are not 
only as efficient as the dull pupils, but actually tend to have an advantage 
over the pupils of the same M.A. but of lower I.Q.” 

In his study of achievement in reading and spelling, Gates (19) 
shows that intelligence ranks second only to a factor called “word 
perception” in its association with achievement. 

Use of Tests for Educational Surveys. According to Van Wagenen 
(49), “For arriving at reasonable goals of achievement in the 
elementary school subjects for pupils of different degrees of capacity 
and for determining when individual pupils have reached these goals 
and are ready for some new activities, the instruments and methods 
of the achievement survey are finding a constantly widening appli- 
cation, for they represent the best thought of many of the foremost 
leaders in the new science of education.” In the same volume he 
offers a very valuable technique whereby an achievement survey can 
be made by the local school people themselves and by which the 
results can be made more useful for evaluation of instruction and 
classification of pupils. 

The primary purpose of Kirk’s (25) survey was “to determine 
the quality of handwriting necessary to meet the social and business 
demands and to determine therefrom standards of attainment for the 
sixth and eighth grades.” 

A survey was made in Brazil by Baker (3) for the main purposes 
of demonstrating the tests, thus stimulating active interest in them, 
and of collecting cases for the standardization of tests in Portuguese. 

Use of Tests for Diagnosis. A greater emphasis on the use of 
tests for purposes of diagnosis is apparent in the past year. Van 
Wagenen (49) says: “The most important use that can be made of 
educational measurements, at least from a supervisory point of view, 
is an analytical and diagnostic one.” He describes the three main 
functions of the measurement movement as: a selective function, a 
diagnostic function, and an evaluation function. Russell (42) gives 
the requisites of diagnostic tests; viz.: they must be of established 
validity, reliability, objectivity, discriminative power, and general 
administrability. 

Buswell (6), in his studies in arithmetic, emphasizes the fact that 
the desired result in diagnosis is a clear understanding on the part of 
the teacher of just how the pupil does his work in order that more 
effective teaching may follow. Gates (18) reports the preparation 
of a series of tests of reading ability in grades 3 to 8 which consti- 
tute a team or battery for diagnostic purposes. The Gates Primary 
Reading Tests (20) are primarily diagnostic in character. They 
“measure comprehension in reading in such a way as to reveal special 
strengths and weaknesses and thereby to indicate the type of 
training most needed by the pupil.” A diagnostic study of poor 
spellers was made by Witty (56), and of handwriting by West (5). 
Diagnostic studies of failure were reported by Loree (26) in ninth- 
grade science, and by Spencer (43) in algebra. 

Willing (55) and Guiler (22) reported initial diagnostic studies 
in English composition, to aid the teacher in finding out early, more 
economically, and more accurately the weaknesses of her classes and 
in determining what training each of the pupils needs. Guiler says: 
“Because of the necessity of mass instruction and because of individual 
variation in achievement and in capacity to learn, diagnostic 
tests and practice exercises become indispensable tools for producing 
definite results.” 

Evaluation and Scoring of Tests. Buckingham (5) summarizes 
the evaluation of tests. He says: “With the development of the test 
movement the demand for an ever-increasing critical evaluation of 
the tests as measuring instruments has become apparent. The test 
makers in seeking to meet this demand have distinguished several 
characteristics which a good test should exemplify. Chief among 
these is the validity of the test. Another way in which authors have 
sought to improve their tests is in respect to reliability.” 

Toops (44) believes that “failure to measure may arise either 
from failure to make effective use of the available instruments or 
from failure to measure the traits which we need to measure.” He 
states that lengthening tests increases reliability and normally the 
validity also but that beyond three or four hours little is to be gained 
by merely lengthening the test. Corning (9) states that the choice 
of tests should depend not upon cost or brevity but upon the degree 
of dependence which can be placed upon results. 

Current and Ruch (12) found that recent reading tests are more 
reliable than the older ones probably due to greater length in terms 
of working time. “In view of the reliability found, 4 or 5 minute 
tests are not very satisfactory as measuring instruments. Probably 
30 to 40 minutes is the minimum time needed for accurate measuring 
of reading ability.” They state, however, that 4 or 5 forms of 
shorter reading tests would yield reliabilities equal to the best 
tests studied. 
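
The relation between length and reliability that Toops and Current and Ruch describe is conventionally expressed by the Spearman-Brown formula, r_k = k r / (1 + (k - 1) r), where r is the reliability of one form and k the number of comparable forms pooled. The review does not say which formula these authors used, so the sketch below is offered only under that assumption; the reliability of .60 for a single short form is a hypothetical figure.

```python
def spearman_brown(r_single, k):
    """Predicted reliability of a test lengthened k-fold (Spearman-Brown)."""
    if not 0.0 <= r_single <= 1.0 or k <= 0:
        raise ValueError("need 0 <= r_single <= 1 and k > 0")
    return k * r_single / (1 + (k - 1) * r_single)

# Hypothetical: one 5-minute form with reliability .60, pooled over 1 to 8 forms.
for k in (1, 2, 4, 5, 8):
    print(k, round(spearman_brown(0.60, k), 3))
```

Each added form raises the predicted reliability by less than the one before, which agrees with the remark that beyond a certain length little is gained by merely lengthening the test.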

Ruch and Degraff (41) report an order of decreasing validity for 
the four following techniques in giving directions and scoring True- 
False tests: instructions not to guess and chance corrections, instructions 
to guess and chance corrections, instructions not to guess without 
chance corrections, and instructions to guess without chance 
corrections. Wood (57) also advocates that students should be 
advised not to guess and that the scores should be rights minus 
wrongs. He believes that the essential character of a question 
depends upon its content as well as upon its external form and that 
more attention must be paid to the subtleties of questions in order 
to improve the validity of tests. He comes to the conclusion that 
“all the data show a crucial need for increasing the validity and 
reliability of our educational measurements. . . . The needed 
improvement will be hastened by more care in the construction of 
individual questions, by drastically lengthening our examinations, and 
by using a greater variety of appropriate question-forms in them.” 
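
The "rights minus wrongs" scoring that Wood recommends, and the chance corrections compared by Ruch and Degraff, correspond to the usual correction-for-guessing formula, S = R - W/(n - 1), where n is the number of choices per item; for true-false items n = 2 and the formula reduces to R - W. The sketch below assumes that omitted items count neither as right nor as wrong. The answer counts are hypothetical.

```python
def corrected_score(rights, wrongs, choices=2):
    """Score corrected for chance: R - W/(n - 1); omitted items are ignored."""
    if choices < 2:
        raise ValueError("an item needs at least two choices")
    return rights - wrongs / (choices - 1)

# Hypothetical pupil on a 100-item true-false test: 70 right, 20 wrong, 10 omitted.
print(corrected_score(70, 20))
# Blind guessing on all 100 items is expected to give about 50 right and 50 wrong.
print(corrected_score(50, 50))
# The same formula applied to a hypothetical four-choice test.
print(corrected_score(70, 21, choices=4))
```

Under this scoring a pupil who guesses blindly on every true-false item can expect a score near zero, which is the sense in which the correction removes the advantage of guessing.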

The standard deviation as a score unit is advocated by Will- 
son (52). Russell (42) discusses reliable composite scores. The 
teacher who needs a very simple introduction to the statistics of tests 
will find the manual by Pressey and Pressey (38) of value. A very 
interesting apparatus which automatically gives and scores a test and 
“which will also, automatically teach—and teach informational and 
drill material more efficiently, in certain respects, than the ‘human 
machine’” is described by Pressey (39). 

A bibliography on standardized tests for secondary schools is 
described by Touton (48). 

PERSONALITY AND CHARACTER TESTS 1 

BY MARK A. MAY, HUGH HARTSHORNE and RUTH E. WELTY 


Teachers College, Columbia University 


In this bibliography the titles are classified in the same way as 
in the one of last year, except that we have added two new sections, 
one on rating scales and one on experiments which, though quanti- 
tative, do not involve tests. In addition to the titles of the year 1926 
we have added a few that were omitted from the last report. Gen- 
eral discussions are omitted unless they refer specifically to measure- 
ments or measuring techniques. 

A. Summaries. Summaries of special phases or parts of the field 
have been made by Brandenburg (14) on character analysis by means 
of certain structural measurements; by Cleeton (26) on originality; 
by Clara Chassell Cooper (27) on habit formation in character; by 
Dodge (33,34) on inhibition; by Yoakum and Manson (149) on 
self-ratings; by Barr (5) on measurement in civics; by Bingham (7) 
on personality traits pertaining to vocations; by Bingham and 
Freyd (8) on personality traits in industry. Wider ranges of titles 
are given in summaries by Haggerty (55) on scientific methods in 
character studies; by Leta S. Hollingworth (63) on character, tem- 
perament and interests of gifted children. Viteles (137) has a sum- 
mary of psychology in industry; von Bracken (13) lists 120 titles, 
mostly German, on experimental techniques in studying character. 
The Committee on Character Education of the National Education 
Association (23) lists 153 titles on character study. Summaries 
adhering more rigidly to measurements are by May and Hart- 
shorne (91), Watson (140), Whaley (146), Freeman (45) and All- 
port (3). A very complete bibliography including 1,364 titles and 
covering almost everything in the field is Manson’s (89). 


B. Batteries Including Various Assemblages of Tests Intended 
to Measure More Than a Single Trait. 

Moss et al. (97, 98, 99) have produced a social intelligence test 
composed of six tests: memory for names and faces; judgment of 
social situations; recognition of mental states from facial expression; 
observation of human behavior; social information; recognition of 
the mental state of the speaker. A similar battery intended to 
measure sociability by Gilliland and Burke (52) is in four parts, three 
using photographs and one a questionnaire. A battery containing 
a mixture of intelligence and personality tests was used by Gallup (47) 
in testing successful retail sales people. 

1 This bibliography has been prepared in connection with an Inquiry in 
Character Education made possible by a grant to Teachers College from the 
Institute of Social and Religious Research. 

The Y.M.C.A. Character Growth Tests (24) present the most 
complete series yet produced of paper and pencil tests for measuring 
various mental skills and contents related to ethical behavior, offering 
three forms for ages 12-14, 14-18, and 18 up, respectively. Each 
test appears in two forms. A great variety of techniques is used. 
No critical studies on these tests have yet appeared. 


C. Tests and Techniques Intended Primarily to Measure Objec- 
tively (and Mainly in Terms of Conduct) Certain Personality Traits 
and Types of Behavior. 

1. Aggressiveness. Gilliland (51) has revised the Moore-Gilliland 
aggressiveness technique by adding certain new tests and discarding 
others. He reports a correlation of .26 with ratings. 
Riddle (112) has attempted to measure “ bluffing ” by a standardized 
procedure in poker playing. 

2. Deception. The reaction time technique for detecting honesty 
employed by Marston and Goldstein has been used by English (42) 
getting results in agreement with Goldstein. Rich (111) has a 
technical discussion of this technique. Landis and Wiley (80) tried 
out Larson’s cardio-pneumo-psychogram technique with more refined 
methods. Their jury could detect deception in a few more cases 
than chance would provide. They conclude, however, that this does 
not necessarily invalidate Larson’s results since court room condi- 
tions are different from laboratory conditions. A technique for 
detecting dishonesty in classroom work is reported by Persing (103) 
who deliberately and systematically graded certain papers too high 
or too low over a period of two years. Papers were returned to 
the students for checking. Ninety-seven per cent reported grades 
that were too low, and 9.5 per cent grades that were too high. Two 
further techniques for detecting deception in classroom work are 
reported by Chambers (20). First he observed the students with a 
mirror from an unseen position to see if they copied from one another. 
Second, on another occasion, he observed the ones who would turn 
the page to see the answers to words to be defined. This was further 
checked by giving the test a third time with no chance to cheat. 

3. Carefulness. In his tests for taxicab drivers Wechsler (143) 
has a technique for testing carefulness principally by measuring 
reaction times to complicated stimuli. 

4. Persistence. Morgan and Hull (96) have measured one type 
of persistence by using a pencil maze, part of which is covered. The 
task is varied and may be increased in difficulty all the way from a 
very easy problem to an impossible one. Subjects are scored on the 
length of time they work and how they work. On this basis they 
are put in one of nine categories running from “ careless—anxious 
to quit’ at one extreme to the “analytical type—intelligently per- 
sistent to the extent that he fully analyses the problem,” at the other. 
They report very high correlations with ratings, getting .842 between 
the test and combined rating of judges (using only the twenty highest 
and twenty lowest cases, however). 
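
The caution about using only the twenty highest and twenty lowest cases can be made concrete. When only the extremes on one variable are kept, the correlation computed from them generally runs higher than in the full group. The simulation below is a sketch under stated assumptions, not a reanalysis of Morgan and Hull's data: the scores are artificial, drawn so that the underlying correlation is about .50, the extremes are assumed to be chosen on the ratings (the review does not say which variable defined them), and the sample is made far larger than theirs so that the result is stable.

```python
import math
import random

def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1927)          # fixed seed so the run is repeatable
true_r = 0.50                      # assumed underlying correlation (hypothetical)
pairs = []
for _ in range(5000):
    rating = rng.gauss(0, 1)
    test = true_r * rating + math.sqrt(1 - true_r ** 2) * rng.gauss(0, 1)
    pairs.append((rating, test))

pairs.sort(key=lambda p: p[0])     # order cases by rating (assumed selection variable)
tenth = len(pairs) // 10
extremes = pairs[:tenth] + pairs[-tenth:]   # lowest and highest tenth only

r_all = pearson_r(*zip(*pairs))
r_extremes = pearson_r(*zip(*extremes))
print(round(r_all, 2), round(r_extremes, 2))
```

A coefficient obtained from extreme cases alone therefore cannot be read as the correlation to be expected in an unselected group.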

5. Resistance. Levy (82) has studied changing tendencies to 
cooperate with the examiner in intelligence tests. Graphs show low 
resistance at six months, high peak at thirty to thirty-five months, 
falling to low level at fifty-four months. Sex differences are noted. 

6. Social Perception. Measured by ability to judge emotions 
from photographs, or the mental state exhibited in the photograph. 
One such test is in the Moss (99) battery. A variation of this is 
reported by Jarden and Fernberger (70) who used on one occasion 
the Titchener-Boring (Piderit) model and on another occasion the 
examiner pantomimed the facial expression. Their aim was to show 
the influence of suggestion on such judgment and they conclude that 
suggestion plays a very important role here. 

7. Social Recognition. A still different technique appears as one 
of the Social Intelligence Tests of Moss et al. (99) and in the 
Gilliland and Burke tests (52). Here the recognition type of experi- 
ment is used; the subject is shown pictures of faces with the names 
and later without the names. The task is to name the faces. 
Rice (110) has tried to measure one source of error in judging character 
from photographs. The error is that of “stereotypes” and is 
shown by the degree to which estimates of intelligence and craftiness 
are influenced by the supposed identity of the pictures. It is really 
a measure of prejudice. 

D. Tests and Testing Techniques Intended to Measure Primarily 
the Affective Aspects of Personality. 

I. Instincts and Emotions. 

a. Laboratory Techniques for Measuring the Relative Strength 
of Emotions and Instincts. 

The relative strength of emotions or instincts has been measured 
by oxygen consumption during emotional stimulation, by Tot- 
ten (133). In six of her thirteen emotional situations she found 
increases in oxygen consumption from 4.9 per cent to 25 per cent. 
Two papers have appeared on efforts to measure the physiological 
effects of severe emotional shock in terms of several physiological 
changes, one by Landis (79) and one by Skaggs (119). A paper 
from the Department of Education of the Japanese Government (40) 
reports urine analyses of 831 elementary school children who had 
been subjected to emotional strain in a competitive examination. 
Marked increase in protein and sugar was found. An elaborate 
study of milder emotional states by the use of plethysmographic and 
pneumographic records is reported by Eng (41). The psycho- 
galvanic index is used by Syz who in one study (130) reports the 
number of galvanic reactions as a measure of emotional stability and 
in another paper (129) recounts the failure of introspective reports 
to accord with galvanic responses. A kind of emotional confusion 
test is reported by Snow (121) as a part of his tests for taxicab 
drivers. The aim is to determine the extent to which the subject is 
confused or delayed or forgets in operating a complicated set of 
levers and lights in a dark room. 

b. Paper and Pencil Tests of Emotionality and Emotional 
Instability. 

Data on the Pressey X-O tests are presented by Guilford (54) 
in a study of emotional tendencies of criminals. The results are 
negative except for certain details. Weber and Guilford (142) 
found that ninety reformatory men are below normal in affectivity 
and above normal in idiosyncrasy. Other results are in accord with 
the previous experiment. Tjaden (132) used the X-O to compare 
delinquent boys with those of superior intelligence. Results are 
mostly negative. No significant differences were found in the total 
but there were interesting differences in reactions to certain words. 
Bond (11) gave the X-O along with the Terman Group, and the 
Downey Will-Temperament to 175 negro college students and found 
that negroes were 25 per cent below the white norms in affectivity, 
suggesting less emotionality. 

Two studies using the Colgate Mental Hygiene Tests (77) have 
been reported. One by Laird (78) using the extremes of responses 
to certain questions to measure introversion or extroversion tendencies; 
the other by Stuart (127) on the emotional symptoms of the 
only child, finding a slight tendency for emotional symptoms to 
increase with size of family. 

Abel (1) reports results on students of different nationalities 
using a questionnaire one part of which is the Woodworth Personal 
Data Sheet, the other parts relating to interests, fears and beliefs. 
She found what she believes to be national differences in sources of 
pleasure, fear and anger. 

c. Records as Basis for Study of Affective Differences. 

The method of having subjects keep detailed records of their 
affective experiences and reporting later to the laboratory has been 
used by Fligel (44), Gates (50) and Stratton (124). Details as to 
time, place, physical conditions, cause, responses made, and impulses 
felt were recorded by Gates’ and by Stratton’s subjects. This method 
seems to be practical, to have reasonable reliability, and to give 
promise of affording significant data on the relation of emotions to 
physical conditions. 


II. Mood and Temperament. 

Three studies on temperamental traits have appeared from the 
Vassar Laboratory. In one (139) Washburn and her students con- 
tinue their studies in emotional and calm temperaments using as 
measures the galvanic reflex, instances of joy, anger or fear recalled, 
and the Downey tests. In another (138) the measures are reaction 
times in the recall of a pleasant or unpleasant idea, and the number 
of stimulus words suggesting such ideas. The third study is by 
Kambouropoulou (73) on the sense of humor. 

Downey will-temperament studies have been reported by 
Wires (147) who found these tests unsatisfactory as measures of 
personality traits in psychopathic delinquents; by Hurlock (69) who 
found the group will-temperament tests unsuited for children; by 
Branham (17) who used them along with the Pressey X-O and other 
tests and found that they aided little in the classification of defective 
delinquents; by Stoddard and Ruch (123) who found low correla- 
tions between test scores and the composite ratings of self and two 
associates, also that student ratings were more consistent than test 
ratings, and that students could not identify their own test profiles. 
Flemming (43) got correlations of -.01 to +.26 between Downey 
tests and school achievement. 

III. Attitudes, Interests, Preferences, Prejudices, etc. 

a. General Attitudes and Interests. 

Zeleny (150) gives a reliability of .89 on a standardized questionnaire 
for measuring changes in student opinion on social questions. 
Narayana Sastry (100) has attempted to distinguish verbal from 
pictorial preferences in children by their galvanic reflexes. Smith 
(120) reports an effort to measure racial preferences by testing 
1,200 Institute of Geneva and Italian children with a variety of 
pictures arranged either symmetrically or non-symmetrically on cards. 

An information test aimed to measure general social orientation in 
juvenile delinquents is reported by Eastman (38). It consists of 100 
questions of general social information. 

b. Specific Attitudes and Interests. 

1. Attitudes toward Deception. May and Hartshorne (90) 
report a technique for scaling attitudes toward specific types of 
conduct by measuring the resistance the subject will overcome to 
indulge in the behavior in question. 

2. Conservative-radical attitudes. A questionnaire study of the 
attitudes of college students on political, economic, religious, domestic 
and moral questions has been made by Lundberg (84) and a similar 
study by Jones (72). 

3. Cribbing. The attitude of 1,500 college students on cribbing 
obtained with an anonymous questionnaire is reported by Katz (74). 
As a part of a larger reaction study certain questions on the preva- 
lence of cribbing, whether the student himself cribbed, if so how 
often and why, and the relation of grading to cribbing, were asked. 
About 50 per cent admitted cribbing in greater or less degree; about
40 per cent condoned or rationalized cribbing in one way or another;
40 per cent thought that a student could not expect to get a fair
grade on more than half of his courses; also 39 per cent said that 
unfair grading was the greatest cause of cheating. 

4. Racial Attitudes. Busch (18) shows the dependence of chil-
dren on adult prejudices by using a three-part questionnaire on 600
children. Lehman and Witty (81) have studied the extent to which
white and negro children will participate with other children in play 
activities by giving the Lehman play quiz to 6,000 Kansas City school 
children. The negro children are more social in their play than white 
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children. Neumann (101) has adapted the Hornell Hart and other 
techniques in a carefully planned study of the race or international 
attitudes of 1,110 high school pupils. Social areas are located where 
attitudes and values are significantly different from those of other 
areas. 

5. Social Distance. The social distance technique has been
used by Binnewies (9) to measure rural social distance; by
Bogardus (10) to study personality clashes; and by Poole (105) to
measure all sorts of social distance.

c. Tests of Specific Interests. 

Five studies using Freyd’s occupational interest blanks have been 
reported. Cowdery (28) has measured the professional attitudes of 
34 doctors, 37 engineers, and 34 lawyers. He reports high reliability 
coefficients and a high degree of efficiency of the scale for this pur- 
pose. Two similar studies are reported by Strong (125, 126), who 
has applied it to personnel managers. Hubbard (65) has applied it 
to students to find if it would differentiate those of different voca- 
tional aims. In another paper (66) she endeavors to determine the 
reliability of the scale, getting reliability coefficients of .47 to .64. 
Miner (95) has worked out a technique for measuring the values 
of various types of interest for different occupations. Each group 
of interests has a different value to different vocations. He proposes 
a way of determining these values statistically. 


E. Tests and Techniques Intended to Measure Primarily Social- 
Ethical Ideas and Judgment. 

I. Tests Requiring the Ranking or Rating of Situations. 

Weber (141) has compared university women’s ranking of the
Brogan list of bad practices with the rankings by reformatory women,
and found no differences. Snyder and Dunlap (122) propose as a 
measure of moral judgment a list of 100 acts. The subject marks
the best act plus ten, the worst minus ten, and a neutral one zero,
and then rates all others on the same scale by comparison with
these three reference acts. They applied the test
to 78 undergraduates. Tanaka (131) has used the same scale with 
359 high school pupils to compare moral values of boys and girls. 
The sexes agree well except in the matter of personal conduct.

Quadfasel (108) has applied the Fernald ethical discrimination 
test to 778 school children and 50 adults, and reports it as rather
unsatisfactory. Pitkin (104) had the ten commandments ranked in
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the order of their importance by 500 men and women between twenty 
and sixty years of age and from the results concludes that 68 per cent 
of our best people are modernists, 12 per cent fundamentalists, and 
20 per cent moral socialists. 

Woodrow (148) reports a picture-preference test consisting of 
forty-four pictures, four on a page, eleven pages. Each set of pictures 
shows a graduated series of responses to a social situation. The 
pupils rank the four pictures on each page according to their prefer- 
ence for them as pictures. He reports a reliability of .79 and a cor- 
relation with mental age of .48, and with character ratings around 
.40, but with significant differences between the top and bottom thirds
of three groups. 


II. Tests Requiring Various Sorts of Responses to Imagined
Situations. 

Persing (103), who also reports a technique for detecting dis- 
honesty, used a paper and pencil test in which the subjects told what 
they would do in situations involving honesty, and found no agree- 
ment between these responses and actual conduct. Garrett and 
Fisher (49) have a true-false test to measure popular misconceptions. 
They found no correlation between misconceptions score and intel- 
ligence. 

Patrick (102), in his study of negroes and whites, used a compre- 
hension test to measure ideals, in which he required a response to 
each answer. The answers were weighted from +5 to -5. Negroes
differed from whites significantly in intelligence but not in ideals as
thus measured. The complete test is appended.

Two word studies are reported by Schwesinger, one a social 
ethical vocabulary test (116) consisting of two forms of 200 words 
each having social implications, the other a study of slang (115) as 
an indicator of delinquent tendencies. In each case positive relations 
between vocabulary and conduct are shown. 

A series of five articles on Testing the Knowledge of Right and 
Wrong has been published by Hartshorne and May et al. (56, 57, 58, 
59, 60), showing the steps taken in building and refining the test, and
certain applications of it to problems of character study. 


F. Ratings. 

I. Techniques and Applications. 

Studies on rating techniques are reported by Arlitt and Dowd (4), 
who found that variability of judgments does not diminish with the 
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training of the judges or with their acquaintance with the children. 
Dorcus (35) has studied the time taken by judges in ratings and 
found that longer time is taken in rating the undesirable traits. Fur- 
fey (46) reports increase in reliability of rating scales by subdividing 
the traits and transmuting the scores into sigma values for combina- 
tions. Hepner (62) has compared the graphic method with the 
method of checking adjectives and finds the latter more satisfactory. 
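
The transmutation into sigma values mentioned above can be illustrated with a minimal sketch. It assumes that "sigma values" means ordinary standard scores (deviation from the group mean divided by the standard deviation) and that sub-trait scores are combined by simple averaging; neither detail is confirmed by the review. The function names and the sample ratings are hypothetical and are not Furfey's procedure or data.

```python
from statistics import mean, pstdev


def to_sigma(scores):
    """Convert raw ratings on one sub-trait to sigma (standard) scores.

    Assumption: sigma value = (score - group mean) / group standard deviation.
    """
    m = mean(scores)
    s = pstdev(scores)
    if s == 0:
        # Every subject received the same rating, so no one deviates.
        return [0.0 for _ in scores]
    return [(x - m) / s for x in scores]


def combine_subtraits(ratings_by_subtrait):
    """Average the sigma scores across sub-traits for each subject.

    Assumption: equal weighting of sub-traits (hypothetical choice).
    """
    sigma_columns = [to_sigma(col) for col in ratings_by_subtrait.values()]
    return [mean(vals) for vals in zip(*sigma_columns)]


# Hypothetical ratings of three subjects on two sub-traits
# that were scored on different raw scales.
ratings = {
    "punctuality": [1, 2, 3],
    "persistence": [10, 20, 30],
}
composite = combine_subtraits(ratings)
```

Because each sub-trait is put on a common scale before combination, a sub-trait rated on a wide raw scale does not outweigh one rated on a narrow scale.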

A wide variety of uses has been made of rating scales. Korn-
hauser (75) discusses their uses in general and lists their advantages.
Schutte (114) suggests their use to aid teachers in locating their 
strong and weak points. Alger (2) tells how ratings were used to 
secure further information on applicants for college admission. 
Heidbreder (61) used the Freyd Scale of paired traits to study 
introversion-extroversion. Hughes used rating to secure data on the 
relation between certain traits and academic success (67) and intel- 
ligence levels (68). Porteus and Babcock (106) adapted the Porteus 
Social Rating Scale to their studies in racial differences. Powers 
(107) used ratings for the classification of mental defectives, Bran-
denburg (15, 16) for studying the relation between personality, char-
acter, and vocational success, and Earle (37) for the study of the
relation between intelligence and personality traits. Flemming (43) 
used ratings as part of a general scheme for measuring high school 
success and for classification in high school. Charters (25) developed 
a scale of traits for homemakers. 

Cox et al. (29) applied a seven-point scale with sixty-seven char-
acter traits to 100 historical cases selected from Cattell’s list, as part
of the Stanford Genetic Studies of Genius. The atypicality of both 
the specific characteristics and the total personality of the genius is 
reported as recognizable from childhood. 


II. Self-Ratings. 

Wells (145) reports further data from his study of personality 
traits of a college group with a self-rating questionnaire on introvert 
and extravert tendencies. Downey (36) asked members of the Amer- 
ican Psychological Association to rate themselves for extroversion- 
introversion, dextrality, and on nervous and emotional stability. 

Yoakum and Manson (149) have made a study of repeated self- 
ratings, using on different occasions trait words of similar meaning. 
They got reliability coefficients from .35 to .63. Uhrbrock (136) 
had 186 men and 151 women rate themselves on paired traits to see 
if there were sex differences and to determine for which pairs the 
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group was significantly dichotomous. He derives a list of norms for 
further comparisons. Rice (109) had college students rate them- 
selves on a radical-conservative-reactionary scale of eight steps before 
the November election of 1924. They each voted a straw ballot at 
the time, which was later compared with their actual votes at the 
election; 77 per cent remained unchanged. Garrett (48) devised a 
self-rating chart for habit systems to secure a “personality” score
and an “integration” score. The results correlate rather low with 
Alpha score and with the Moss Social Intelligence Score. 


G. Experiments Involving Quantitative Studies. 

I. Trait Studies. 

1. Confidence. The criteria of confidence have been analyzed by 
Lund (83) in a careful laboratory experiment. 

2. Deception. Three studies of children’s lies have appeared. 
von Baumgarten (6) discusses the relation of lies to pity, shame, and 
other traits. Meyer (94) has studied the kinds of lies and their 
frequency and suggests methods of prevention. Tudor-Hart (134) 
has studied the replies of school children to the question, “Are there 
cases in which lies are necessary?” and found a wide variety of 
responses. 

3. Leadership. Chapin reports two studies (21, 22) on relations
between activities, grades, and physical condition in college students. 
Student office holders have been compared in height, weight, years in 
school, scholarship, behavior, etc., with non-office holders by Rohr- 
back (113). Caldwell and Wellman (19) have made a similar study 
with boys and girls in grades seven to nine who had been chosen lead- 
ers in school activities, finding that characteristics of leaders vary 
with the activity which they lead. Bowden (12) has made another 
such study, using the Allport technique (J. Abn. and Soc. Psych., 
1921) for measuring total personality. He tested forty presidents 
of student councils in forty colleges. He reports the data in profiles. 

4. Popularity. Dexter (32), using a questionnaire with college 
students, tried to discover the causes of the popularity of courses, 
teachers, and fellow students. Wellman (144) has studied the char- 
acter resemblances of friends, counting the number of times children 
were seen together as an indication of strength of friendship. 


II. Studies of Moral and Social Concepts and Ideals.


Johnston (71) has investigated the moral judgment of 329 stu- 
dents in Glasgow Training College by having them react to five moral 
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judgments. He compared snap judgments with judgments after 
deliberation. Macaulay (86) and Macaulay and Watkins (87, 88), in 
three articles, report investigations of the moral concepts and ideals 
of a large number of children. The method was to ask one or two 
questions and have the subjects make lists of wicked things or indi- 
cate the historic figure who is most ideal, etc. Studencki (128) used 
trick questions such as, “A boy or girl has thought, ‘I am very bad.’
Tell me why he has thought so.” He found that home and school 
influences predominate up to thirteen years of age. 

Meltzer (93) measured the relation between “talkativeness” and
knowledge and found a correlation of +.31 for all subjects, but +.16
for fifth grade, +.50 for ninth grade, and +.65 for twelfth grade.
The correlation between National Intelligence Score and talkativeness
is .36, and that between the number of ideas explained and
talkativeness is .69. In another paper (92)
he reports an elaborate and critical analysis of children’s social con- 
cepts. Searles (117) has made an extensive inquiry into college 
students’ ideas of God. 


III. Miscellaneous Studies.

Lundberg (84) has made an effort to determine the influence of
the press, by finding out people’s opinions and the papers they read.
Dexter (31) has made out a fairly good case for the influence of weather
on conduct. Crawford (30) has studied the relation between intel-
ligence and achievement of scholarship applicants at Yale and finds
that the added motivation of these applicants tends to increase the
correlation.
Eells (39) has studied the reason for church attendance by college 
students. Sengupta (118) finds that the presence of others looking on 
facilitated such work as cancellation of A’s and E’s. Kretschmer (76) 
reports an elaborate study of the relation between certain physical 
characteristics or physiques and certain temperamental types. 
Graves’ work (53) is an interesting contrast to Kretschmer’s, showing 
the indicative value of a single anatomical configuration (shoulder 
blades) for biological fitness. Hoyland (64) gave a questionnaire to 
1,164 children in India to compare Indian children with western 
children in various ethical and social traits. Uhrbrock (135) studied 
the relation between interest and ability of 253 college men. Mean 
intelligence scores on the college entrance examination are reported 
as follows: For those interested primarily in ideas, 68.7; in things, 
64.8; in people, 61. 
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